## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'kableExtra'

Load the data

We load two data files: air_travel_MB.csv contains the network description and airport_codes_CAN.csv, which has some information about all airfields in Canada.

The data in air_travel_MB is pre-processed. It represents an extreme simplification of a much more complete dataset giving, for a given time period, the total number of travellers between 3734 airports worldwide along trips including up to 5 intermediate stops.

What I did is to keep only the airports in Manitoba. Any airport outside of Manitoba is set as RoW (rest of the world). Also, the existing multi-leg trips within Manitoba are “decomposed”: if the data had 10 people flying, say, YWG\(\to\)YQD\(\to\)YTH (Winnipeg to The Pas to Thompson), these would appear as 10 people flying from YWG to YQD and 10 people flying from YQD to YTH.

Flights to and from RoW are decomposed the same way as flights within Manitoba, increasing the number of trips between RoW and YWG. Indeed, 5 passengers flying RoW\(\to\)YWG\(\to\)YQD and 5 passengers flying RoW\(\to\)YWG\(\to\)YBR (Brandon) will show as 10 passengers flying RoW to YWG, together with 5 passengers flying YWG\(\to\)YQD and 5 passengers flying YWG\(\to\)YBR.

First, here is what the flight data looks like. (I am using knitr::kable to make the table more readable.)

orig dest vol
RoW YBR 2002
RoW YTH 18
RoW YWG 157323
RoW YYQ 1474
XLB RoW 766
XLB XTL 114

The airport data, filtered to only retain MB locations, looks as follows.

Airport_ID Identifier Type Name Latitude Longitude Elevation_Ft Continent ISO_Country_Code ISO_Region Municipality Is_Scheduled_Service GPS_Code IATA_Code Local_Code Home_Link Wikipedia_Link Keywords
35391 CA-0006 closed RCAF Station Carberry 49.87210 -99.39730 NA NA CA CA-MB Carberry FALSE NA NA NA NA https://en.wikipedia.org/wiki/RCAF_Station_Carberry NA
39668 CA-0022 closed Anama Bay-Dauphin River Airport 51.96267 -98.13684 NA NA CA CA-MB NA FALSE NA NA NA NA NA NA
39677 CA-0031 closed Austin Airport 49.93333 -98.91666 NA NA CA CA-MB NA FALSE NA NA NA NA NA NA
39686 CA-0040 closed Beausejour Airport 50.13697 -96.20822 NA NA CA CA-MB Beausejour FALSE NA NA NA NA NA NA
39694 CA-0048 closed Bethany Airport 50.35000 -99.75000 NA NA CA CA-MB NA FALSE NA NA NA NA NA NA
39696 CA-0050 closed Bird Airport 56.50000 -94.21667 NA NA CA CA-MB NA FALSE NA NA NA NA NA NA

Data wrangling remark

Note that there is an alternative to dplyr (the package that allows us to use the pipe %>%); it is indeed possible to treat data frames as SQL tables and run SQL queries on them. This may make more sense to some of you. For illustration, to create the table of movements, I used the following command (which can run, since it is not destructive of what we have already done).

query = "SELECT orig,dest,SUM(vol) AS vol
FROM air_travel_MB
GROUP BY orig,dest
ORDER BY orig,dest"
air_travel_MB = sqldf::sqldf(query)

Finish preparing the data: the following code (chunk not shown) adds the latitude and longitudes of all the airports in the air_travel_MB data set. Note that we give RoW a location: we will show it as an additional vertex in the graph. (I picked somewhere in Ontario, close to but not in Manitoba.)

IATA lon lat
RoW -90.0000 49.5000
XLB -101.4690 58.6175
XSI -98.9072 56.7928
XTL -98.5122 58.7061
YBR -99.9519 49.9100
YBT -101.6790 57.8894

We are now in a position to make the graph and plot it in simple form. We add the volume as the weight of the arcs.

We also make a subgraph without RoW to focus on the properties really specific to Manitoba.

Making nicer plots

Before we explore the properties of the graph, let us spend a little time on visualisation. This is not to be neglected: nice plots help more than you think.

The next piece of code does the following: - Redefine the output size (does it once for all, need to modify if need be) - Uses the library raster to read in the shape file for Canada - Selects Manitoba from the Canada shape file - Plots Manitoba - Plots the graph

Let us add the degree information as the size of the vertices.

Now let us take a look at the arcs. Maybe change their thickness to reflect the volumes along the arcs?

My guess: some of the arcs are so massive that this is all we can see. So let us renormalise the values of the weights. First, what do they look like now?

##   [1]     18     47     48     48     50     98     99    114    116    174
##  [11]    176    208    208    212    242    276    278    279    288    288
##  [21]    288    292    292    309    337    337    343    344    364    366
##  [31]    368    369    372    383    399    401    401    406    447    447
##  [41]    449    451    451    462    529    565    567    582    592    644
##  [51]    650    652    652    682    684    694    696    701    702    713
##  [61]    714    758    766    780    806    848    854   1036   1050   1167
##  [71]   1172   1242   1270   1279   1342   1349   1394   1462   1464   1474
##  [81]   1485   1543   1565   1607   1638   1750   1857   2002   2230   2518
##  [91]   2582   2877   2899   2910   2924   3037   3157   3355   3355   3365
## [101]   4246   4571   5770   5789   7430  11464 157323 183256

So, yes, clearly a bit too much variation. If we scale linearly by multiplying by a constant, we will most likely lose the information on the small weights. Just to check that we are intuiting correctly, though, let us try once doing this.

What we suspected, indeed: to make the arcs with the largest volume not too large, we had to essentially get rid of the weight of the arcs with small volumes. So what we want to do is to process the values through a function that would make the smaller values larger relative to the larger ones or the larger values closer to the smaller one, then scale the result. For this, it is good to look at what the values look like.

One way around this could be to use a so-called sigmoid function. This is a function of the following form: \[ f(x) = \dfrac{1}{1+e^{-cx}}, \] where \(\mathbb{R}\ni c>0\) is a parameter. This function has the following properties: \(\lim_{x\to-\infty}f(x)=0\) and \(\lim_{x\to\infty}f(x)=1\). By playing with the value of \(c\), one can set the beahviour of \(f\). Let us see how different values of \(c\) influence the value of the points. (The viridis palette used starts with yellow and goes towards green and then purple.)

That last one looks quite nice. We most likely will have to multiply by a factor, so we compute the values in advance.

Well, after all this, I am not convinced we achieved between than we had done above but just a linear scaling. So when we want a a good plot, we’ll come back to that solution.

Central points, centre, periphery

For simplicity, I only show the out-degree case. And just to show how easily it’s done: let’s colour the vertices in the centre red and those in the periphery blue.

## Central points:
##  [1] "YFO" "YGO" "YGX" "YOH" "YQD" "YTH" "YWG" "YYQ" "ZGI" "ZTM"
## Periphery:
##  [1] "XLB" "XSI" "XTL" "YBR" "YBT" "YBV" "YCR" "YDN" "YIV" "YNE" "YRS" "YST"
## [13] "ZAC"

Girth

Recall that girth in igraph assumes the graph is undirected.

## $girth
## [1] 3
## 
## $circle
## + 3/24 vertices, named, from 3c01b07:
## [1] XLB RoW XTL
## $girth
## [1] 3
## 
## $circle
## + 3/23 vertices, named, from 5e4decc:
## [1] XTL XLB YTH

Density

What fraction of possible arcs/edges are present?

## [1] 0.1956522
## [1] 0.1600791

Cliques

## Warning in largest_cliques(G_MB): At
## vendor/cigraph/src/cliques/maximal_cliques_template.h:219 : Edge directions are
## ignored for maximal clique calculation.
## [[1]]
## + 5/23 vertices, named, from 5e4decc:
## [1] ZGI YGO YWG YTH YOH

Centrality

Degree-centrality

## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ 
##   2   1   2   2   1   1   2   2   3   4   2   2   2   4   3   4   2  14  17   2 
## ZAC ZGI ZTM 
##   1   5   3
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ 
##   2   1   2   2   1   1   2   2   3   4   2   3   2   4   3   4   1  14  17   2 
## ZAC ZGI ZTM 
##   1   5   3
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ 
##   4   2   4   4   2   2   4   4   6   8   4   5   4   8   6   8   3  28  34   4 
## ZAC ZGI ZTM 
##   2  10   6

K-nearest neighbours

For info, the return values of the function knn are - knn: A numeric vector giving the average nearest neighbor degree for all vertices in vids. - knnk: A numeric vector, its length is the maximum (total) vertex degree in the graph. The first element is the average nearest neighbor degree of vertices with degree one, etc.

## $knn
##       XLB       XSI       XTL       YBR       YBT       YBV       YCR       YDN 
## 16.000000 28.000000 16.000000 19.000000 28.000000 34.000000 19.000000 19.000000 
##       YFO       YGO       YGX       YIV       YNE       YOH       YQD       YRS 
## 22.666667 20.000000 31.000000 17.400000 19.000000 20.000000 22.666667 13.750000 
##       YST       YTH       YWG       YYQ       ZAC       ZGI       ZTM 
## 24.333333  7.142857  6.705882 31.000000 28.000000 17.200000 23.333333 
## 
## $knnk
##  [1]       NaN 29.500000 24.333333 21.250000 17.400000 22.888889       NaN
##  [8] 17.916667       NaN 17.200000       NaN       NaN       NaN       NaN
## [15]       NaN       NaN       NaN       NaN       NaN       NaN       NaN
## [22]       NaN       NaN       NaN       NaN       NaN       NaN  7.142857
## [29]       NaN       NaN       NaN       NaN       NaN  6.705882
## $knn
##       XLB       XSI       XTL       YBR       YBT       YBV       YCR       YDN 
## 16.000000 28.000000 16.000000 19.000000 28.000000 34.000000 19.000000 19.000000 
##       YFO       YGO       YGX       YIV       YNE       YOH       YQD       YRS 
## 22.666667 20.000000 31.000000 15.000000 19.000000 20.000000 22.666667 13.750000 
##       YST       YTH       YWG       YYQ       ZAC       ZGI       ZTM 
## 34.000000  7.142857  6.705882 31.000000 28.000000 17.200000 23.333333 
## 
## $knnk
##  [1] 30.400000 21.250000 20.916667 17.916667 17.200000       NaN       NaN
##  [8]       NaN       NaN       NaN       NaN       NaN       NaN  7.142857
## [15]       NaN       NaN  6.705882
## $knn
##       XLB       XSI       XTL       YBR       YBT       YBV       YCR       YDN 
## 16.000000 28.000000 16.000000 19.000000 28.000000 34.000000 19.000000 19.000000 
##       YFO       YGO       YGX       YIV       YNE       YOH       YQD       YRS 
## 22.666667 20.000000 31.000000 21.000000 19.000000 20.000000 22.666667 13.750000 
##       YST       YTH       YWG       YYQ       ZAC       ZGI       ZTM 
## 19.500000  7.142857  6.705882 31.000000 28.000000 17.200000 23.333333 
## 
## $knnk
##  [1] 29.500000 21.050000 22.888889 17.916667 17.200000       NaN       NaN
##  [8]       NaN       NaN       NaN       NaN       NaN       NaN  7.142857
## [15]       NaN       NaN  6.705882

Let us plot the values in the graph, say, for the in-knn.

Coreness

## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ 
##   4   2   4   4   2   2   4   4   6   8   4   4   4   8   6   6   3   8   8   4 
## ZAC ZGI ZTM 
##   2   8   6
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ 
##   2   1   2   2   1   1   2   2   3   4   2   2   2   4   3   3   1   4   4   2 
## ZAC ZGI ZTM 
##   1   4   3
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ 
##   2   1   2   2   1   1   2   2   3   4   2   2   2   4   3   3   2   4   4   2 
## ZAC ZGI ZTM 
##   1   4   3

Let us look at the out-coreness, for instance, and colour the vertices as a function of that.

Betweenness

##        RoW        XLB        XSI        XTL        YBR        YBT        YBV 
##  63.416667   0.000000   0.000000   0.000000   2.166667   0.000000   0.000000 
##        YCR        YDN        YFO        YGO        YGX        YIV        YNE 
##   0.000000   0.000000   0.000000   0.000000   0.000000   0.500000   0.000000 
##        YOH        YQD        YRS        YST        YTH        YWG        YYQ 
##   0.000000   0.000000   2.666667   0.000000 191.916667 250.166667   0.000000 
##        ZAC        ZGI        ZTM 
##   0.000000   5.083333   3.083333
##        XLB        XSI        XTL        YBR        YBT        YBV        YCR 
##   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000 
##        YDN        YFO        YGO        YGX        YIV        YNE        YOH 
##   0.000000   0.000000   0.000000   0.000000   0.500000   0.000000   0.000000 
##        YQD        YRS        YST        YTH        YWG        YYQ        ZAC 
##   0.000000   2.666667   0.000000 211.666667 280.166667   0.000000   0.000000 
##        ZGI        ZTM 
##   6.000000   4.000000

Closeness

First, the unweighted case, i.e., using the geodesic distance.

##     XLB     XSI     XTL     YBR     YBT     YBV     YCR     YDN     YFO     YGO 
## 0.02000 0.01961 0.02000 0.02128 0.01961 0.02083 0.02128 0.02128 0.02439 0.02500 
##     YGX     YIV     YNE     YOH     YQD     YRS     YST     YTH     YWG     YYQ 
## 0.02381 0.02174 0.02128 0.02500 0.02439 0.02222 0.02128 0.03333 0.03704 0.02381 
##     ZAC     ZGI     ZTM 
## 0.01961 0.02564 0.02439
##     XLB     XSI     XTL     YBR     YBT     YBV     YCR     YDN     YFO     YGO 
## 0.02000 0.01961 0.02000 0.02128 0.01961 0.02083 0.02128 0.02128 0.02439 0.02500 
##     YGX     YIV     YNE     YOH     YQD     YRS     YST     YTH     YWG     YYQ 
## 0.02381 0.02174 0.02128 0.02500 0.02439 0.02222 0.02083 0.03333 0.03704 0.02381 
##     ZAC     ZGI     ZTM 
## 0.01961 0.02564 0.02439
##     XLB     XSI     XTL     YBR     YBT     YBV     YCR     YDN     YFO     YGO 
## 0.02000 0.01961 0.02000 0.02128 0.01961 0.02083 0.02128 0.02128 0.02439 0.02500 
##     YGX     YIV     YNE     YOH     YQD     YRS     YST     YTH     YWG     YYQ 
## 0.02381 0.02128 0.02128 0.02500 0.02439 0.02222 0.02128 0.03333 0.03704 0.02381 
##     ZAC     ZGI     ZTM 
## 0.01961 0.02564 0.02439

We can also compute closeness using weighted arcs, i.e., using the travel volumes. Geodesic distance shown for comparison. We save the results for future manipulation.

##        XLB        XSI        XTL        YBR        YBT        YBV        YCR 
## 0.02000000 0.01960784 0.02000000 0.02127660 0.01960784 0.02083333 0.02127660 
##        YDN        YFO        YGO        YGX        YIV        YNE        YOH 
## 0.02127660 0.02439024 0.02500000 0.02380952 0.02127660 0.02127660 0.02500000 
##        YQD        YRS        YST        YTH        YWG        YYQ        ZAC 
## 0.02439024 0.02222222 0.02127660 0.03333333 0.03703704 0.02380952 0.01960784 
##        ZGI        ZTM 
## 0.02564103 0.02439024
##          XLB          XSI          XTL          YBR          YBT          YBV 
## 3.556061e-05 4.171882e-05 3.788453e-05 4.025603e-05 3.989150e-05 2.928172e-05 
##          YCR          YDN          YFO          YGO          YGX          YIV 
## 3.762935e-05 3.726893e-05 3.605163e-05 5.860634e-05 2.420721e-05 3.335557e-05 
##          YNE          YOH          YQD          YRS          YST          YTH 
## 2.986323e-05 6.466632e-05 4.283206e-05 5.351887e-05 2.247494e-05 6.413134e-05 
##          YWG          YYQ          ZAC          ZGI          ZTM 
## 5.218934e-05 4.675738e-05 4.381353e-05 6.568576e-05 5.549390e-05

To prepare for plots, let us look at the ranges of values.

## [1] 0.01960784 0.03703704

## [1] 2.247494e-05 6.568576e-05

So if we multiply the first values by 10,000 and the second ones by 1e7, we should get good ranges of variation.

## [1] 196.0784 370.3704
## [1] 224.7494 656.8576

Indeed, so that’s what we do.

One interesting remark here: closeness radically changes when considering the unweighted and weighted version. Just to confirm this, let us plot the two graphs side by side.

So Thompson (YTH) is has higher closeness centrality than Winnipeg (YWG) when passenger volumes are considered. Does the situation change if we recall that Winnipeg is quite connected to the rest of the world? We prepare the computations and the plots as we had done earlier.

## [1] 0.01960784 0.03571429
## [1] 2.144864e-05 6.247267e-05

Ranges are quite similar to those earlier, so we can probably just do the same as before. We will plot the 4 figures this time. First row is as before, second row is with rest of the world information thrown in.

Actually, things do not change, except that RoW becomes important as well.